Machine Learning: Context

Peter Freeman (2019 SLSW)

8 July 2019

What is Machine Learning?

The short version:

The longer version:

Machine Learning: the Broader Context

Machine Learning: Which Algorithm is Best?

That’s not actually the right question to ask.

(And the answer is not deep learning. Because if the underlying relationship between your predictors and your response is truly linear, you do not need to apply deep learning! Just do linear regression. Really. It’s OK.)
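As a minimal sketch of the point above, assume simulated data whose true relationship is linear. The slope of 2, intercept of 1, noise level, and the helper name `fit_line` are all illustrative assumptions, not taken from the slides. Ordinary least squares recovers the relationship directly, with no need for anything more elaborate:

```python
import random

# Simulate data from a hypothetical linear truth: y = 2x + 1 + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
print(f"estimated slope = {slope:.3f}, estimated intercept = {intercept:.3f}")
```

The estimates should land close to the values used to simulate the data, which is all a more complex model could hope to achieve here.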

The right question to ask is: why should I try different algorithms?

The answer is that, without superhuman powers, you cannot visualize the distribution of predictor variables in their native space. (Of course, you can visualize these data in projection…this is a point we will return to when we discuss exploratory data analysis.) And the performance of different algorithms will be predicated on how predictor data are distributed…

Data Geometry

The picture above shows data for which there are two predictor variables (along the x-axis and the y-axis) and for which the response variable is binary: x’s and o’s. An algorithm that utilizes linear boundaries or segments the plane into rectangles will do well given the data to the left, whereas an algorithm that utilizes circular boundaries will fare better given the data to the right.

“do well/fare better”: will do a better job at predicting whether a new datum is actually an x or an o.
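The idea can be sketched with simulated data. Everything in the block below is an illustrative assumption rather than the data in the original figure: two predictors drawn uniformly on a square, one dataset whose classes are split by a vertical line, and one whose classes are split by a circle. The "linear" learner is restricted to a single threshold on x, and the "circular" learner to a single threshold on squared distance from the origin. The names `make_data`, `fit_threshold`, and `accuracy` are hypothetical helpers.

```python
import random

def make_data(kind, n, rng):
    """Simulate n points in [-1, 1]^2 with a binary label."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if kind == "line":
            label = x > 0.0                  # classes split by a vertical line
        else:
            label = x * x + y * y < 0.5      # classes split by a circle
        data.append(((x, y), label))
    return data

def fit_threshold(data, feature):
    """Pick the threshold (and direction) on one feature that best fits the training data."""
    values = [feature(p) for p, _ in data]
    best = (0.0, None, False)                # (training accuracy, threshold, flip)
    for t in values:
        hits = sum((v > t) == label for v, (_, label) in zip(values, data))
        for flip, h in ((False, hits), (True, len(data) - hits)):
            acc = h / len(data)
            if acc > best[0]:
                best = (acc, t, flip)
    _, t, flip = best
    return lambda p: (feature(p) > t) != flip

def accuracy(model, data):
    """Fraction of points whose predicted label matches the true label."""
    return sum(model(p) == label for p, label in data) / len(data)

features = {
    "linear": lambda p: p[0],                          # boundary is a line x = t
    "circular": lambda p: p[0] ** 2 + p[1] ** 2,       # boundary is a circle r^2 = t
}

rng = random.Random(1)
results = {}
for kind in ("line", "circle"):
    train, test = make_data(kind, 300, rng), make_data(kind, 300, rng)
    results[kind] = {
        name: accuracy(fit_threshold(train, f), test) for name, f in features.items()
    }
    print(kind, results[kind])
```

Under these assumptions, each learner predicts new data well only on the dataset whose geometry matches its boundary shape, which is the reason for trying more than one algorithm.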